MEDB 5502, Week 10, The dark side of data science

Topics to be covered

  • What you will learn
    • Quiz and poll questions
    • Empiricism and its critics
    • Recidivism case study
    • Search history case study
    • License plate case study
    • Personalized medicine case study
    • What’s the cause and what’s the cure

Talk given to first year medical students

  • Topic also relevant to this class.
  • Only a few minor changes
    • Different format for the “programming” assignment

Who am I?

Steve Simon

  • PhD Statistics, 1982, U Iowa
  • Teach in Biomedical and Health Informatics
    • Previous jobs at CMH, CDC
  • Part-time independent statistical consultant (P.Mean Consulting)
  • Married to a Pediatric Cardiologist (retired)
  • Run 5K and 4 mile races

Obsessed with computers since 1972

Figure 1. Section on computer skills from my resume

Worked with health care applications since 1987

  • Recent positions
    • Centers for Disease Control and Prevention (1987-1996)
    • Children’s Mercy Hospital (1996-2008)
    • UMKC School of Medicine (2008 to present)
  • But…
    • I am not a doctor
    • Still confused about many things
      • Example: Difference between good and bad cholesterol.

Quiz questions (1/3)

Why does Joel Best call statistics a social construct?

  • Statistics are misquoted often on social media.
  • Statistics are selected, shaped, and presented by human beings.
  • Statistics are used to promote socialism.
  • Statistics are dehumanizing.

Quiz questions (2/3)

What is the main philosophical foundation of empiricism?

  • Everything can be reduced to a mathematical equation.
  • Experiments can reveal the realities of the world.
  • Some questions are impossible to answer.
  • We construct our own reality based on our own lived experiences.

Quiz questions (3/3)

What is a major problem with data science?

  • Data scientists rely on large amounts of data with uneven quality.
  • Models developed by data scientists can lead to loss of privacy.
  • Prediction models are a black box that can hide discriminatory intent.
  • All of the above.

First poll question

Figure 2. Quote from “Peggy Sue Got Married”

Second poll question

Figure 3. Images of various computers

Are Statisticians Gods?

I’m helping someone who wants an alternative statistical analysis to the one used by the principal investigator. I’m happy to help and will offer advice about why my approach may be better, but I was warned that the PI considers the analysis chosen to be ordained by the “Statistical Gods” at her place of work.

Break #1

  • What you have learned
    • Quiz and poll questions
  • What’s coming next
    • Empiricism and its critics

Statistics are a social construct

Figure 4. Cover from Joel Best’s book

The arrogance of empiricism (1/2)

“I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be.” Lord Kelvin.

The arrogance of empiricism (2/2)

“No human investigation can be called real science if it cannot be demonstrated mathematically.” Leonardo da Vinci

Why empiricism fails (1/3)

“The government is very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn well pleases.” Sir Josiah Stamp

Why empiricism fails (2/3)

Figure 5. Cover of Stephen Jay Gould’s book

Why empiricism fails (3/3)

Figure 6. Frame from Yes, Prime Minister video clip

But numbers still have value. Example: Quality of Life

Figure 7. First few questions from SF-36 form

Rules for using Statistics as social constructs

  1. Understand the context in which the Statistic was generated.

  2. Identify possible biases

  3. Recognize limitations

  4. Beware of confirmation bias

  5. Avoid nihilistic thinking

Things have gotten worse, thanks to big data

  1. Large amounts of data of uneven quality

  2. Black box models

  3. Lack of accountability

  4. Scaling problems

  5. Loss of privacy

Break #2

  • What you have learned
    • Empiricism and its critics
  • What’s coming next
    • Recidivism case study

Case study evaluations

  • Answer the following questions
    • Who is the villain?
    • Who is the victim?
    • How was the victim harmed?
    • What could have prevented this?
    • Did anything surprise you?
    • Did you disagree with anything in the article?
    • Is there a single quote from the article that summarizes it well?

Here are the articles for your review

Figure 8. Excerpts from three articles

Weapons of Math Destruction

Figure 9. Cover of Cathy O’Neil’s book

First villain

  • Walter Quijano
    • Provided testimony on recidivism rates at seven trials
    • Unfairly included race in his calculations and testimony
    • Six of seven convictions later overturned

Second villain

  • LSI-R questionnaire
    • Given to thousands of inmates
    • Classifies risk of recidivism
    • Does not explicitly ask about race
    • But does have “leading” questions

Victims

  • Inmates at parole hearings
  • Defendants at trial sentencing

How were the victims harmed?

  • Disproportionate recidivism risks by race
    • Fewer parole grants
    • Longer sentences
  • No avenue to appeal
    • Model presumed to be unbiased
    • Complexity prevents examination of bias
  • Scale issues
    • Walter Quijano harmed 7 defendants
    • LSI-R harmed thousands of defendants.

What could have prevented this?

  • Insist on transparency
  • Test the model for bias
  • Build the model with better objective

Did anything surprise you?

  • Questionnaire includes questions that would be inadmissible if they were asked during a normal trial
    • When was the first time you were ever involved with the police?
    • Do any of your friends or relatives have a criminal record?

Did you disagree with anything in the article?

  • No

Is there a single quote that summarizes the article well?

“The questionnaire includes circumstances of a criminal’s birth and upbringing, including his or her family, neighborhood, and friends. These details should not be relevant to a criminal case or to the sentencing.”

Break #3

  • What you have learned
    • Recidivism case study
  • What’s coming next
    • Search history case study

A Face Is Exposed for AOL Searcher No. 4417749

Figure 10. First page of newspaper article

Break #4

  • What you have learned
    • Search history case study
  • What’s coming next
    • License plate case study

How a ‘NULL’ License Plate Landed One Hacker in Ticket Hell

Figure 11. First page of newspaper article

Break #5

  • What you have learned
    • License plate case study
  • What’s coming next
    • Personalized medicine case study

How Bright Promise in Cancer Testing Fell Apart

Figure 12. First page of newspaper article

Break #6

  • What you have learned
    • Personalized medicine case study
  • What’s coming next
    • What’s the cause and what’s the cure

Myths about big data

  1. Algorithms are objective

  2. If you have enough data, quality is no longer an issue

  3. We are getting better at this

No single cause of these problems

  1. Wrong data

  2. Wrong objective

  3. Wrong deployment

  4. Wrong team

What they say data science is

What data science really is

Why you are needed

  1. Too many geeks, not enough scientists
  2. More racial, gender diversity is needed

If you are interested, email me: simons@umkc.edu

You can find my talk here:

https://github.com/pmean/papers-and-presentations/blob/master/dark-side/2022-talk.pptx

Summary

  • What you have learned
    • Quiz and poll questions
    • Empiricism and its critics
    • Recidivism case study
    • Search history case study
    • License plate case study
    • Personalized medicine case study
    • What’s the cause and what’s the cure

Additional topics?